METHOD FOR DIRECT MEMORY ACCESS, RELATED ARCHITECTURE AND
COMPUTER PROGRAM PRODUCT

CROSS REFERENCE TO RELATED APPLICATIONS 
This application is the US national phase of PCT
application PCT/EP2002/013847, filed 6 December 2002, published
24 June 2004 as WO 2004/053708, and claiming the priority of PCT
patent application PCT/EP2002/013847 itself filed 6 December
2002, whose entire disclosures are herewith incorporated by
reference.

FIELD OF THE INVENTION

The present invention relates to techniques for direct
memory access (DMA).

BACKGROUND OF THE INVENTION 
As depicted in FIG. 1, a typical System On Chip
arrangement for direct memory access requires a CPU to
communicate with a number of blocks (intellectual properties or
IPs) generically called A, B, C, which can be connected together.
In such a prior art arrangement, data transfer from A to B and
from B to C is scheduled by the CPU, which monitors the state of
the ongoing processes on the basis of interrupts from the blocks.

A wide variety of possible variants of such a basic 
arrangement are known in the art. 

For instance, in US-A-4 481 578 a DMA arrangement is
disclosed including a DMA controller connected to each of a
plurality of processors to facilitate transfer of bulk data from
one memory to the other without the intervention of either or
both processors.

In US-A-5 212 795 a programmable DMA controller
arrangement is disclosed for regulating access of each of a
number of I/O devices to a bus. The arrangement includes a
priority register storing priorities of bus access from the I/O
devices, an interrupt register storing bus access requests of the
I/O devices, a resolver for selecting one of the I/O devices to
have access to the bus, a pointer register storing addresses of
locations in a memory for communication with the one I/O device
via the bus, a sequence register storing an address of a location
in the memory containing a channel program instruction which is
to be executed next, and an ALU for incrementing or decrementing
addresses stored in the pointer register, computing the next
address to be stored in the sequence register and computing
initial contents of the registers.

In US-A-5 634 099 another DMA arrangement (designated a
Direct Access Memory Unit or DAU) is disclosed wherein the CPU
requests a DMA by writing information relevant to the DMA to a
remote processor's memory. The CPU can abort a pending DMA
request during DAU operations by setting a skip bit in a control
block, while an interrupt can also be sent to the CPU wherein the
CPU is advised that a DMA request has been completed.

In US-B-6 341 328 multiple co-pendent DMA controllers
are provided to read and write common data blocks to two
peripheral devices. As a result, only one read and one write
command are required for the data to be written to two peripheral
devices.

In US-A-2002/0038393 a distributed DMA arrangement is
disclosed for use within a system on a chip (SoC). The DMA
controller units are distributed to various function modules
desiring direct memory access. The function modules interface to
a system bus over which the direct memory access occurs. A
global buffer memory is coupled to the system bus and bus
arbitrators are used to arbitrate which function modules have
access to the system bus to perform the direct memory access.
Once a function module is selected by the bus arbitrator to
have access to the system bus, it can establish a DMA routine
with the global buffer memory.

OBJECT OF THE INVENTION 

The object of the present invention is thus to provide
an improved DMA arrangement overcoming the intrinsic
disadvantages of the prior art arrangements considered in the
foregoing.

SUMMARY OF THE INVENTION 
A preferred application of the invention is in
exchanging data within a direct memory access (DMA) arrangement
including a plurality of IP blocks. An embodiment of the
invention provides for associating to the IP blocks respective
DMA modules, each including an input buffer and an output buffer.
These DMA modules are coupled over a data transfer facility in a
chain arrangement wherein each DMA module, other than the last in
the chain, has its output buffer coupled to the input buffer of
another DMA module downstream in the chain and each DMA module,
other than the first in the chain, has its input buffer coupled
to the output buffer of another DMA module upstream in the chain.
Each DMA module is caused to interact with the respective IP
block by writing data from the input buffer of the DMA module
into the respective IP block and reading data from the respective
IP block into the output buffer of the DMA module. The input and
output buffers of the DMA modules are operated in such a way
that:
- writing of data from the input buffer of the DMA module into
the respective IP block is started when the input buffer is at
least partly filled with data;
- when reading of data from the respective IP block into the
output buffer of the DMA module is completed, the data in the
output buffer of the DMA module are transferred to the input
buffer of the DMA module downstream in the chain or, in the case
of the last DMA module in the chain, are provided as output data.
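
The chained operation just summarized can be illustrated with a
small software model. The sketch below is only an illustration
under stated assumptions: the class name DmaModule, the threshold
parameter standing for "at least partly filled", and the use of
plain Python functions as stand-ins for the IP blocks are all
hypothetical and are not taken from the specification.

```python
from collections import deque
from typing import Callable, List, Optional


class DmaModule:
    """Hypothetical software model of one DMA module wrapping an IP block."""

    def __init__(self, ip: Callable[[int], int], threshold: int = 1):
        self.ip = ip                      # stand-in for the IP block
        self.threshold = threshold        # items needed before the IP starts
        self.inbuf: deque = deque()       # input buffer
        self.outbuf: deque = deque()      # output buffer
        self.downstream: Optional["DmaModule"] = None

    def step(self, sink: List[int]) -> None:
        # Start writing to the IP only when the input buffer is at least
        # partly filled (here: it holds `threshold` items or more).
        if len(self.inbuf) >= self.threshold:
            while self.inbuf:
                self.outbuf.append(self.ip(self.inbuf.popleft()))
        # Once reading from the IP is complete, forward the output buffer
        # downstream, or deliver it as output data for the last module.
        target = self.downstream.inbuf if self.downstream else sink
        while self.outbuf:
            target.append(self.outbuf.popleft())


def run_chain(modules: List[DmaModule], data: List[int]) -> List[int]:
    # Couple each output buffer to the input buffer of the next module.
    for up, down in zip(modules, modules[1:]):
        up.downstream = down
    modules[0].inbuf.extend(data)         # e.g. loaded by the CPU
    sink: List[int] = []
    for module in modules:                # one pass, upstream to downstream
        module.step(sink)
    return sink


chain = [DmaModule(lambda x: x + 1),
         DmaModule(lambda x: x * 2),
         DmaModule(lambda x: x - 3)]
result = run_chain(chain, [1, 2, 3])
```

In this model the data pass through the three stand-in IPs in
sequence without any central scheduler deciding when each stage
runs; only the buffer fill level triggers processing.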

A particularly preferred embodiment of the invention
provides for associating to the output buffers and input buffers
coupled in the chain at least one intermediate block to control
data transfer between the buffers coupled to each other.
Transfer of data between the coupled buffers over the data
transfer facility is then controlled by issuing at least one
request of a requesting buffer for a buffer coupled therewith to
indicate at least one transfer condition out of i) data existing
to be transferred and ii) enough space existing for receiving
said data when transferred. At least one corresponding
acknowledgement is then issued towards the requesting buffer
confirming that the at least one transfer condition is met and
data are transferred between the requesting buffer and the buffer
coupled therewith. The data transfer facility (BUS) between the
two coupled buffers is thus left free between the request and the
acknowledgement.

A CPU may be included in the arrangement considered for
transferring data to be processed into the input buffer of the
first DMA module in the chain, and collecting the output data
from the output buffer of the last DMA module in the chain. The
CPU may also be used for configuring the DMA modules.

The invention also includes the architecture of a module
for implementing the method referred to in the foregoing as well
as a computer program product directly loadable into the memory
of a digital computer and including software code portions for
carrying out the method of the invention when the product is run
on a computer.

A preferred embodiment of the invention (that can be
referred to as "intelligent" direct memory access or IDMA) has
been developed to provide a reusable wrapper with DMA
capabilities for hardware IP blocks. The aim is to realize e.g.
a System On Chip prototyping system structured as a bus system
where all the IPs are accessible through the system bus. An
embodiment of the IDMA architecture presented herein can also be
used as the final solution in integrated Systems On Chip.
A significant feature of the preferred embodiment of
the IDMA described herein is the organization of DMA operations
on the bus. This improves overall performance of a SoC where
[...]

BRIEF DESCRIPTION OF THE DRAWING

An embodiment of the invention will now be described by
way of example only, by referring to the annexed views, in which:
FIG. 1 has already been described in the foregoing;
FIG. 2 is a block diagram illustrating a System on Chip
architecture for intelligent DMA (IDMA);
FIG. 3 shows a typical example of System on Chip design
flow;
FIG. 4 is a block diagram of an embodiment of IDMA
module architecture;
FIGS. 5 and 6 are further detailed block diagrams of
parts of the embodiment of FIG. 4;
FIG. 7 is a flow chart illustrating certain instruction
flows within an embodiment of the invention;
FIGS. 8 and 9 are additional detailed block diagrams of
other parts of the embodiment of FIG. 4;
FIG. 10 shows a possible corresponding memory
organization; and
FIG. 11 shows a specific embodiment of the system of
FIG. 2.

SPECIFIC DESCRIPTION

The exemplary architecture shown in FIG. 2 is a bus-based
system including a bus (BUS) as a basic data transfer
facility. The architecture also includes a CPU, a memory block
MEM plus a plurality of IPs designated A, B, C. These IPs are
accessible through the system bus via respective "intelligent"
DMA modules (IDMAs) designated IDMA A, IDMA B, and IDMA C,
respectively.

Each IDMA module (hereinafter, briefly, IDMA) can be
described as a respective version of a single VHDL core suitable
to be easily adapted to different IPs by modifying a given set of
parameters. This is a point that makes IDMA a suitable solution
for fast prototyping.

To understand the role of the IDMA in SoC design one
may refer to FIG. 3, which represents the typical design flow of
a SoC.

Starting from system specification 100 based on
literature 101 and documents 102, an early system simulation step
103 is realized in a bit-true style, e.g. based on a Matlab 104 or
C/C++ description 105, to obtain results as close as possible to
the final implementation.

After this step, a partition 106 must be effected in
order to identify those modules 107a that will be realized with
customized hardware parts (generally third party IPs) and those
modules 107b that will be implemented as a software routine by
an embedded processor.
Generating the hardware modules 107a requires steps
such as hardware architecture exploration 108, followed by
hardware synthesis 109 and technology mapping 110, the steps 108
to 110 being carried out with IPs both of the "soft" type 112 and
the "hard" type 114.

Similarly, generating the software modules 107b 
requires software description 116, C code translation 118 and a 
microprocessor development system 120. 

The partition 106 must be verified and tested in a
HW/SW prototyping step, designated 122, before proceeding, in
step 124, to SoC realization proper.

Many partitions might be prototyped before defining the 
best one. This may require a very long time if the prototype is 
not fast. 

In fact one of the critical issues of prototyping (and
developing) SoC is interfacing HW and SW parts through a system
bus in a general arrangement as shown in FIG. 1.

Generally, IPs such as IPs 112 and 114 have very simple
interfaces that must be adapted to the bus. This affects the
timing and the logical meanings of the signals. Also, the
schedule of the IP must be controlled and this implies a complex
activity on the System On Chip controller.

If the scheduling of different blocks is not properly
organized, the bus may be congested. Consequently, "interfacing"
a new IP to the system bus may require a very long time in terms
of new design and interface development.
The IDMA architecture generally indicated 10 in FIG. 4
is particularly adapted for overcoming the drawbacks that are
inherent in prior art arrangements. To that effect, it includes
input and output buffers 11 and 12 to generate the input data to
the respective IP (not shown) and receive therefrom corresponding
IP output data, respectively.

The buffers 11 and 12 cooperate with a reprogrammable
FSM (Finite State Machine) such as a RAM based FSM (briefly RBF)
13 of a known type that manages the IP controls. The RBF 13 is
activated and controlled by a MCU (Main Control Unit) 14.

References 15 and 16 designate two master blocks,
designated "master-in" and "master-out" blocks, respectively.
The master blocks 15 and 16 permit data communication from a
system bus 17 (preferably patterned after advanced microcontroller
bus architecture, AMBA) to the input buffer 11 and from the
buffer 12 to the bus 17. Reference 18 designates an AMBA AHB
(Advanced High-performance Bus) slave interface.

Reference 19 designates an internal register
(Instruction Register or IR) interposed between the MCU 14 and
the slave interface 18.

Finally, reference 20 designates an interrupt
controller (INT CTRL) activated by the MCU 14 to generate
interrupts (up to 16, in the embodiment shown).

The IDMA 10 has a very flexible IP interface that can be
easily adapted to different IPs. This reduces the time needed to
connect a new IP.
The IDMA 10 can be configured according to the IP
requirements. This feature eliminates the time needed to build
custom logic to drive a new IP.

The IDMA 10 has different master and slave bus
interfaces. This feature eliminates the time required to build a
particular slave or master interface. Thanks to these features
the IDMA reduces the time required to set up a new prototype.

The IDMA architecture 10 is very simple and flexible,
which makes it particularly suitable to be expanded to
accommodate, e.g., new bus and IP interfaces.

The core description is technology independent; it can
be used on different prototyping systems and also on the final
implementation. In the presently preferred embodiment, the core
was developed in VHDL but it can be exported to a Verilog
project.

The description will now refer to a situation
encountered by a designer implementing a design flow as shown in
FIG. 3. When wishing to define a new partitioning with new IPs,
in order to connect to the bus through the IDMA 10, the designer
will just need to perform a few operations.

As a first step, a VHDL wrapper will have to be created
in order to connect the IP and the IDMA 10.

To that effect, a suitable set of parameters (e.g. a
set of twenty integer values) is chosen according to the
application requirements. Preferably, creation of the VHDL
wrapper and insertion of the application parameters are performed
with the support of a suitable software tool.

The IDMA/IP core is then included in the design, while
programming the IDMA run time and running the prototyping
emulation.

In the embodiment shown, a virtual channel is created
between those blocks that must be directly connected together.
The blocks may then activate data transfer between them only when
necessary and without any CPU usage. This feature makes the DMA
shown herein an "intelligent" one.

The virtual channel is realized by resorting to the two
buffers 11 and 12 (for the data input to the IP and the data
output from the IP, respectively) that allow a logical separation
between the IP and the bus.
The DMA characteristics are obtained by means of the
master block(s) 15 and 16 associated with the input buffer 11
and/or the output buffer 12.

This feature leads to the core becoming able to access
the bus directly; the input and output buffers 11 and 12 are
intended to store data to and from the IP and to optimize access
to the bus.

Also, it will be appreciated that providing two
separate master blocks (masterin 15, masterout 16) within the
architecture of FIG. 4 may represent a preferred choice in terms
of flexibility. However, just one block is usually activated in
each IDMA.
In the following, the various blocks comprising the
architecture 10 will be detailed.

Architecture 10 is generally intended to co-operate
with "internal" registers not explicitly shown in FIG. 4. Each
of these registers is implemented in a particular module
according to its function and is accessed through the system bus.
There are e.g. 33 registers whose width varies between 32 and 1
bit. Nevertheless they are mapped in the memory plan on 32-bit
word aligned addresses. Write and read access availability may
change according to different implementations for each register.

The general input-output layout of the input buffer
(inbuffer) 11 is shown in FIG. 5.

Preferably, this block is realized as a FIFO memory. 

Input data come from the system bus 17 and output data
are sent to the IP input interface. When data are stored in the
inbuffer the IDMA 10 can download them to the IP while the system
bus is free. Writing and reading operations can take place
simultaneously, thus allowing data to be downloaded to the IP
while the bus 17 is filling the buffer.
A preferred feature of this block is that the input
data are 32 bits wide while the width of the output to the IP can
be programmed run time by the user. This implies a significant
optimization of the bus activities. For instance, if an IP
requires 64 bits to start processing and has a 1-bit wide port,
the user can download all the data with 2 bus cycles in the input
buffer 11. Then the IDMA 10 sends one data item per clock cycle
to the IP while the system bus can be used for other purposes.
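
The arithmetic behind this example can be checked with a short
sketch. The function names transfer_cycles and serialize are
hypothetical, and the MSB-first bit ordering used when splitting
a bus word is an assumption made only for illustration; the
specification does not state the ordering.

```python
import math

BUS_WIDTH = 32  # bits per bus transfer, as in the embodiment described


def transfer_cycles(total_bits: int, ip_port_width: int) -> tuple:
    """Return (bus_cycles, ip_cycles) needed to move `total_bits`."""
    bus_cycles = math.ceil(total_bits / BUS_WIDTH)
    ip_cycles = math.ceil(total_bits / ip_port_width)
    return bus_cycles, ip_cycles


def serialize(words, total_bits, ip_port_width):
    """Split 32-bit bus words into items of `ip_port_width` bits.

    Assumption: bits are taken MSB first; only `total_bits` bits are kept,
    which covers totals that are not a multiple of 32.
    """
    bits = "".join(format(w & 0xFFFFFFFF, "032b") for w in words)[:total_bits]
    return [int(bits[i:i + ip_port_width], 2)
            for i in range(0, len(bits), ip_port_width)]


cycles = transfer_cycles(64, 1)
items = serialize([0x80000000, 0x00000001], 64, 1)
```

For the case in the text (64 bits, 1-bit IP port) the bus is busy
for 2 cycles, while the IP side needs 64 cycles during which the
bus is available to other masters.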

The input buffer 11 is always accessed through the same
bus address (it is a FIFO memory and always has the same entry
point). In this case, the FIFOWR signal is driven high while the
data are strobed on the FIFODIN port. Before storing the
data, the total amount of bits to be loaded must be communicated
to the inbuffer by driving high the BTN_VAL signal and driving
the corresponding value on the BTN port. This allows the input
buffer 11 to organize the data when the total amount is not a
multiple of 32 bits. The writing operation can be driven by the
masterin module 15 or by external components through the slave
interface module 18. In the memory organization, the BTN port is
handled as an internal register.

When reading the data the FIFOOE signal must be driven
high. The data are available through the FIFODOUT port when the
DOUTVAL signal is driven high. FIFODOUT is connected directly to
the IP. The RBF 13 controls FIFOOE and DOUTVAL.

The FIFODIMENSION and DATASIZE ports can be used to
configure the input buffer 11. DATASIZE indicates how many bits
must be read at the same time. FIFODIMENSION is used to
configure the size of the buffer in terms of 32-bit words. The
value driven on FIFODIMENSION cannot exceed the physical size of
the inbuffer. These values can be configured by external
components through the SLAVE_INTERFACE port. FIFODIMENSION and
DATASIZE correspond to internal registers.
The VALIDBIT (VBI) and BITINFIFO ports provide
information regarding the internal status of the buffer.

In particular VALIDBIT indicates how many bits are
available for reading and BITINFIFO indicates how many bits are
physically present in the inbuffer 11.

These values are not always identical because of the
internal organization of the data. These values can be read by
the other modules in the IDMA 10 and also by an external
component through the slave interface module 18.
This feature allows external modules to obtain
information as to the buffer status and decide whether or not to
download data to the buffer. VALIDBIT and BITINFIFO correspond
to internal registers.

The entire contents of the input buffer 11 can be read
and written through a RAM port without changing the internal
status of the FIFO pointers. This feature allows debug
operation on the buffer. The RAM port is accessible through the
slave interface module 18.

The general input-output layout of the output buffer
(outbuffer) 12 is shown in FIG. 6.

This block is again realized as a FIFO memory. Input
data come from the IP and output data are sent to other modules
through the system bus 17. When data are driven out from the IP
the IDMA 10 can write them in the output buffer 12 while the bus
is free. Writing and reading operations can occur at the same
time, allowing data to be downloaded from the IP while the bus 17
is reading the buffer 12.



An advantageous feature of this block is that the input
data width can be programmed run time by the user while the
output data width is e.g. 32. This implies a significant
optimization of the bus activities. For instance, if an IP
provides a 2-bit wide output, it will require 32 clock cycles to
provide 64 bits that can be read through the bus in 2 clock
cycles.

When writing data the FIFOWR signal must be driven high
while data are driven on the FIFODIN port. FIFODIN is connected
directly to the IP. The RBF 13 controls FIFOWR.

The output buffer 12 is read through the same bus
address (it is a FIFO memory and always has the same output
point). In this case the FIFOOE signal is driven high and data
are strobed on the next clock cycle. In the embodiment shown,
the output buffer read operation is always 32 bits wide. The
reading operation can be driven by the masterout module 16 or by
external components through the slave interface 18.

The FIFODIMENSION and DATASIZE ports can be used to
configure the output buffer 12. DATASIZE indicates how many
bits must be read at the same time. FIFODIMENSION is used to
configure the size of the buffer in terms of 32-bit words. The
value driven on FIFODIMENSION does not exceed the physical
dimension of the output buffer 12. These values can be
configured by external components through the SLAVE_INTERFACE
port. FIFODIMENSION and DATASIZE correspond to internal
registers.

The VALIDBIT (VBO) and BITINFIFO ports provide
information regarding the internal status of the buffer.

Specifically, VALIDBIT indicates how many bits are
available for reading and BITINFIFO indicates how many bits are
physically present in the output buffer 12. These values are not
always the same because of the internal organization of the data.
These values can be read by the other modules in the IDMA 10 as
well as by an external component through the slave interface
module 18. Again, this feature allows the external modules to
know the buffer status and decide whether or not to download data
to the buffer. VALIDBIT and BITINFIFO correspond to internal
registers.

The entire contents of the output buffer 12 can be read
and written through a RAM port without changing the internal
status of the FIFO pointers. This feature allows debug operation
on the buffer. The RAM port is accessible through the slave
interface module 18.

The module 13 is a finite state machine (FSM) that
drives operation on the IP.

Its main role is to take data from the input buffer 11,
download them in the IP, receive output data from the IP and
store them in the output buffer 12.

As these operations can vary run time, especially in
prototyping systems, this FSM must be programmable run time. For
this reason it is preferably realized (in a manner known per se)
with RAM memories that can be written through the system bus.
Each RAM address contains a description of one finite state, with
all the possible state transitions and the output values. The
RBF 13 is activated and controlled by the MCU 14. When the RBF
13 flow finishes (if a finish state is defined) the RBF 13 can
drive high the RBFFINISH signal to the MCU 14. Any implementation
of a reprogrammable FSM can be used.
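
A table-driven model may help to picture how one RAM word can
describe one state. The sketch below is a minimal illustration,
not the RBF 13 itself: the StateEntry layout, the single integer
input per clock, and the finish flag standing in for RBFFINISH
are assumptions, and only the signal names FIFOOE and FIFOWR are
borrowed from the buffer descriptions above.

```python
from typing import Dict, List, NamedTuple, Tuple


class StateEntry(NamedTuple):
    """One RAM word: outputs of the state plus next state per input value."""
    outputs: Dict[str, int]
    next_state: Dict[int, int]   # input value -> RAM address of next state
    finish: bool = False         # assumed stand-in for the RBFFINISH signal


class RamBasedFsm:
    def __init__(self, depth: int):
        self.ram: List[StateEntry] = [StateEntry({}, {}) for _ in range(depth)]
        self.state = 0

    def write(self, address: int, entry: StateEntry) -> None:
        # Reprogramming at run time: overwrite one state description.
        self.ram[address] = entry

    def clock(self, input_value: int) -> Tuple[Dict[str, int], bool]:
        entry = self.ram[self.state]
        # Stay in the current state when no transition is defined.
        self.state = entry.next_state.get(input_value, self.state)
        return entry.outputs, entry.finish


fsm = RamBasedFsm(depth=4)
fsm.write(0, StateEntry({"FIFOOE": 1, "FIFOWR": 0}, {0: 0, 1: 1}))
fsm.write(1, StateEntry({"FIFOOE": 0, "FIFOWR": 1}, {0: 1, 1: 2}))
fsm.write(2, StateEntry({"FIFOOE": 0, "FIFOWR": 0}, {}, finish=True))
trace = [fsm.clock(i) for i in (1, 1, 0)]
```

Changing the behavior of the machine only requires new calls to
write, which mirrors the idea of loading the RAM through the
system bus instead of redesigning the control logic.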

In the embodiment shown, the AMBA/AHB slave interface
18 is a standard AMBA slave interface that permits access to the
internal registers, the RBF 13 and the buffers 11 and 12. It
provides some control on the address values to verify them and
give error responses if necessary. In the presently preferred
embodiment, the whole IDMA addressing space is 64 Mbytes. The
base address (IDMA Base Address or IBA) can be changed run time
or can be fixed.
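
The kind of address control mentioned here can be sketched as
follows. This is a hedged illustration only: the function name
decode, the example base address 0x40000000, and the choice of
rejecting non-word-aligned accesses are assumptions; the text
only states that the space is 64 Mbytes and that registers sit on
32-bit word aligned addresses.

```python
IDMA_SPACE = 64 * 1024 * 1024   # 64 Mbytes addressing space
WORD_BYTES = 4                  # 32-bit word aligned addresses


def decode(address: int, iba: int) -> int:
    """Return the word offset inside the IDMA space, or raise on a bad access."""
    offset = address - iba
    if not 0 <= offset < IDMA_SPACE:
        raise ValueError("address outside the IDMA space")
    if offset % WORD_BYTES:
        # Assumption: a misaligned access gets an error response.
        raise ValueError("address not 32-bit word aligned")
    return offset // WORD_BYTES


IBA = 0x40000000                # hypothetical base address
word_index = decode(IBA + 0x10, IBA)
```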

The MCU 14 controls the overall functionality of the
IDMA 10. It generates commands towards all the other blocks and
provides external information to the system using interrupts via
the module 20. Preferably, the MCU 14 is realized as a FSM that
executes instructions.

The instructions are loaded by the user in the internal 
register 19 called instruction register or IR. 

The core instruction is the GO instruction whose flow
is depicted in FIG. 7. The main purpose of the GO instruction is
to activate the flow of the RBF 13.
Normally the RBF flow enables data transfer from the
input buffer 11 to the IP and from the IP to the output buffer
12. When executing a GO instruction (starting from a GO RECEIVED
step 200) the MCU 14 checks the status of buffers 11 and 12 (VBI
and VBO) before and after data processing 201. In the comparison
steps 202 and 203-204 that follow the start step 200 and the
processing step 201, the VBI (Valid Bit Inbuffer) and VBO (Valid
Bit Outbuffer) contents are compared with expected values. These
values are stored as EBI (Expected Bit Inbuffer) and EBO
(Expected Bit Outbuffer) values.

In particular the RBF flow is enabled, thus leading to
the processing step 201 via a first interrupt 205, only if VBI
(Valid Bit Inbuffer) is equal to or greater than EBI.

After the processing step 201, in a step 206 the MCU 14
also checks the RBFFINISH signal to ascertain whether the RBF has
finished its flow. Feedback is given to the system with specific
interrupts. This allows other modules to access the IDMA buffers
only under particular conditions. The complete GO instruction
flow is detailed in FIG. 7, where the steps 207 to 211 designate
other interrupts.

Normally, if all the data have been processed and all 
the output results have been produced, the process should end 
with interrupt 3, that is step 208. 

The table reproduced hereinbelow details the meaning of
the interrupts in question.
INTERRUPT      MEANING

1 (step 205)   Data processing has begun as there are enough
               data in the inbuffer.

2 (step 207)   No data processing is executed as there are not
               enough data in the inbuffer.

3 (step 208)   Data processing finished. The outbuffer 12 is
               full and the inbuffer 11 is empty, i.e. VBI=0
               (step 212).

4 (step 209)   Data processing finished. The outbuffer 12 is
               not full and the inbuffer 11 is empty.

5 (step 210)   Data processing finished. The outbuffer 12 is
               full and the inbuffer 11 is not empty.



The "GO" instruction is downloaded into each IDMA from
the CPU and enables the respective IDMA to perform a single IP
process.
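
The decision logic of a single GO instruction, as far as it can
be read from the interrupt table, may be sketched as below.
Several points are assumptions and not statements of the
specification: the outbuffer is taken to be "full" when VBO is
equal to or greater than EBO, the process callback is a
hypothetical stand-in for the RBF flow of step 201, the RBFFINISH
check of step 206 is omitted, and the remaining case (outbuffer
not full, inbuffer not empty) is reported as None because the
table does not describe it.

```python
from typing import Callable, List, Optional, Tuple


def go(vbi: int, vbo: int, ebi: int, ebo: int,
       process: Callable[[int, int], Tuple[int, int]]) -> List[Optional[int]]:
    """Return the interrupt numbers raised by one GO instruction."""
    if vbi < ebi:
        return [2]                          # not enough data in the inbuffer
    interrupts: List[Optional[int]] = [1]   # processing begins
    vbi, vbo = process(vbi, vbo)            # RBF flow (step 201)
    in_empty = vbi == 0                     # VBI=0, as in the table
    out_full = vbo >= ebo                   # assumed meaning of "full"
    if out_full and in_empty:
        interrupts.append(3)
    elif in_empty:
        interrupts.append(4)
    elif out_full:
        interrupts.append(5)
    else:
        interrupts.append(None)             # case not covered by the table
    return interrupts


def consume_all(vbi: int, vbo: int) -> Tuple[int, int]:
    # Hypothetical IP: every input bit becomes one output bit.
    return 0, vbo + vbi


normal_end = go(vbi=64, vbo=0, ebi=64, ebo=64, process=consume_all)
starved = go(vbi=10, vbo=0, ebi=64, ebo=64, process=consume_all)
```

With these stand-ins, a run that consumes all the input and
produces the expected amount of output ends with interrupt 3,
which matches the normal termination described above.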

In order to ensure that the IDMAs are always active
without CPU commands, each IDMA implements an extension of the
"GO" instruction ("GOAL" instruction) that substantially
corresponds to the flowchart of FIG. 7 with the additional
provision of return paths leading from each of the (end)
interrupts 207, 208, 209, 210, and 211 back to the input of the
comparison step 202.

When the process is finished, the IDMA 10 always polls 
the VBI value to determine when a new process can start. 

Meanwhile the masterin module 15 (or the coupled
masterout block 16 of another IDMA) can store data in the input
buffer 11. Once the data are available a new RBF processing is
activated.
The interrupt controller 20 is a very simple FIFO that
receives interrupts from the MCU 14. Each interrupt arriving
from the MCU 14 is stored in the FIFO.

When the FIFO is not empty, one of the interrupt
signals is driven high to be recognized by an external device.
The user can configure the association between the interrupt
level and a particular interrupt pin.

When the external device receives the interrupt it can
access the interrupt FIFO 20 through the bus to read which
interrupt has been produced. When the interrupt FIFO is empty,
no interrupt is asserted. It is possible to mask some
interrupts. In this case the interrupt is not stored in the
FIFO.
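
The behavior of the interrupt FIFO can be pictured with a short
model. The class name InterruptFifo and its methods are
hypothetical; the assumption that a bus read removes the oldest
entry is made for illustration only, and the configurable mapping
between interrupt levels and pins is left out.

```python
from collections import deque


class InterruptFifo:
    """Hypothetical model of the interrupt controller 20."""

    def __init__(self, masked=()):
        self.fifo = deque()
        self.masked = set(masked)

    def post(self, interrupt: int) -> None:
        # A masked interrupt is not stored in the FIFO.
        if interrupt not in self.masked:
            self.fifo.append(interrupt)

    @property
    def asserted(self) -> bool:
        # No interrupt is asserted while the FIFO is empty.
        return bool(self.fifo)

    def read(self) -> int:
        # Bus read by the external device; assumed to pop the oldest entry.
        return self.fifo.popleft()


ctrl = InterruptFifo(masked={2})
for number in (1, 2, 3):
    ctrl.post(number)
seen = []
while ctrl.asserted:
    seen.append(ctrl.read())
```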

The basic layout of the masterin block or module 15 is
shown in FIG. 8. The programmable masterin block (along with the
masterout block 16) is the core of the IDMA functionality. Its
purpose is to upload data through the bus 17 and to download them
in the input buffer 11. It is activated by the MCU 14 through an
ENABLE signal.

As better explained in the following, the input buffer
11 is intended to be (virtually) coupled to the output buffer of
another IDMA (not shown in FIG. 4) located "upstream" in the
general layout of FIG. 2.

This preferably occurs either via the respective
masterin block 15 or via the masterout block 16 associated to the
output buffer of the IDMA located "upstream".
When coupling is obtained via the respective masterin
block 15, such masterin block 15 tries to fetch data from such
coupled output buffer of another IDMA; this occurs only if the
valid bit in the input buffer 11 is less than the value driven on
the REFERENCE VALUE port. This control improves bus occupation
and system performance.

The fetch operation is structured in three parts. 

At first, the masterin block 15 sends a request to the
coupled output buffer of another IDMA to know if there are enough
data to load. This request occurs through the bus 17 in one
clock cycle. The total amount of data to be loaded is determined
by the value on the BITTOTRANSFER port.

Then, the output buffer of the other IDMA being
questioned sends back an acknowledgement when the data are
available. This operation occurs through the bus 17 in one clock
cycle. Between the request and the acknowledgement the bus is
kept totally free.

Finally, when the acknowledgement is asserted, the
masterin block 15 proceeds to the transfer between the two IDMAs
involved. The process of request/acknowledgement is a
significant point in creating a virtual channel between two
IDMAs.

Thanks to this feature, the control CPU does not have 
to control the IDMA behavior and it can run other procedures. 
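
The three-part fetch described above can be sketched in software.
This is a simplified model under stated assumptions: buffers are
plain Python lists with one element per bit, the class OutBuffer
and the function masterin_fetch are hypothetical names, and the
acknowledgement is modeled as the return value of a repeated call
rather than as a separate bus cycle issued later by the coupled
buffer.

```python
class OutBuffer:
    """Stand-in for the coupled output buffer of the upstream IDMA."""

    def __init__(self):
        self.bits = []

    def request(self, bits_to_transfer: int) -> bool:
        # Acknowledge only when enough data are available.
        return len(self.bits) >= bits_to_transfer

    def pop(self, count: int):
        taken, self.bits = self.bits[:count], self.bits[count:]
        return taken


def masterin_fetch(inbuf: list, upstream: OutBuffer,
                   reference_value: int, bits_to_transfer: int) -> str:
    # Gate: fetch only if the valid bits in the input buffer are below
    # the value on the REFERENCE VALUE port.
    if len(inbuf) >= reference_value:
        return "idle"
    # Parts 1 and 2: request, then wait for the acknowledgement.
    if not upstream.request(bits_to_transfer):
        return "waiting"                # the bus stays free meanwhile
    # Part 3: transfer between the two IDMAs.
    inbuf.extend(upstream.pop(bits_to_transfer))
    return "transferred"


upstream = OutBuffer()
inbuf = []
first = masterin_fetch(inbuf, upstream, reference_value=32, bits_to_transfer=8)
upstream.bits = [1] * 8
second = masterin_fetch(inbuf, upstream, reference_value=32, bits_to_transfer=8)
```

The point of the scheme is visible in the "waiting" outcome: no
data cycles are spent on the bus until the coupled buffer can
actually deliver the requested amount.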
The basic layout of the masterout block or module 16 is
shown in FIG. 9. The programmable masterout block 16 (along
with the masterin block 15) is the core of the IDMA
functionality. Its purpose is to upload data from the output
buffer 12 and download them through the bus 17. It can access
other components in the system only under particular conditions.
It is activated by the MCU 14 through an ENABLE signal.

As indicated previously, coupling between the input
buffer of an IDMA and the output buffer of another IDMA arranged
"upstream" in the chain can also be achieved via the masterout
block 16 associated with such output buffer.

In that case, the masterout block 16 in question, when
enabled, tries to store data in the coupled input buffer of said
another IDMA 10 only if the valid bits in the output buffer 12
are more than the value driven on the BITTOTRANSFER port. This
control improves bus occupation and system performance.

Again, the store operation is structured in three
parts.

At first, the masterout block 16 sends a request to the
coupled input buffer of the other IDMA to know if there are
enough locations to store the data. This request occurs through
the bus 17 in one clock cycle. The total amount of data to be
stored is determined by the value on the BIT TO TRANSFER port.

Then, the input buffer of the other IDMA being
questioned sends back an acknowledgement when the amount of
memory locations requested is available. This operation occurs
through the bus 17 in one clock cycle. Between the request and
the acknowledgement the bus is totally free.



Finally, when the acknowledgement is asserted, the
masterout block 16 proceeds to the transfer. The process of
request/acknowledgement just described is a significant point in
creating a virtual channel between two IDMAs.

Thanks to this feature the control CPU does not have to
control the IDMA behavior and it can run other procedures.
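
By way of non-limiting illustration only, the bus activity during
one store attempt can be sketched as a per-cycle trace. The
function name, the trace labels and the free_locations_per_cycle
stimulus are hypothetical; the actual transfer is shown as a
single entry because its duration in clock cycles is not stated
here.

```python
def store_trace(valid_bits, bit_to_transfer, free_locations_per_cycle):
    """Return a per-cycle trace of bus 17 activity for one store attempt.

    free_locations_per_cycle lists, cycle by cycle after the request,
    how many free locations the coupled input buffer has (an assumed
    stimulus, not taken from the figures).
    """
    # Attempt a store only if the output buffer 12 holds more valid
    # bits than the value driven on the BIT TO TRANSFER port.
    if valid_bits <= bit_to_transfer:
        return []
    trace = ["request"]  # part 1: one clock cycle on the bus
    for free in free_locations_per_cycle:
        if free >= bit_to_transfer:
            # part 2: acknowledgement (one cycle); part 3: the transfer
            trace += ["acknowledge", "transfer"]
            return trace
        trace.append("free")  # the bus is left free while waiting
    return trace
```

For example, store_trace(8, 4, [0, 2, 4]) shows two free bus
cycles between the request and the acknowledgement, during which
other traffic could use the bus 17.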

All the internal memory resources of the IDMA 10
(internal registers and buffers) are mapped in the memory
starting from the IBA (IDMA Base Address). The IBA is chosen at
synthesis time and can be changed, if necessary, at run time. The
memory plan is preferably as shown in FIG. 10.

FIG. 11 essentially details the basic scheme of FIG. 2
by highlighting the presence of input buffers (11A, 11B, 11C),
output buffers (12A, 12B, 12C) and masterout blocks (16A, 16B) in
the IDMAs designated IDMA A, IDMA B, and IDMA C.

There, the IDMA A and IDMA B have respective masterout 
blocks 16A, 16B that couple the associated output buffers 12A, 
12B with the input buffers 11B, 11C of IDMA B and IDMA C. 

Specifically, the output buffer 12A is coupled via the
masterout block 16A with the input buffer 11B and the output
buffer 12B is coupled via the masterout block 16B with the input
buffer 11C.

In such a chain arrangement each IDMA module has: 
a) its output buffer (12A, 12B) coupled to the input
buffer (11B, 11C) of another IDMA module located "downstream" in
the chain; and/or
b) its input buffer (11B, 11C) coupled to the output
buffer (12A, 12B) of another IDMA module located "upstream" in
the chain.

Specifically, the IDMA A module fulfils only condition
a); the IDMA B module fulfils both conditions a) and b); and the
IDMA C module fulfils only condition b).
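
By way of non-limiting illustration only, conditions a) and b) can
be derived from the list of couplings of FIG. 11. The names
COUPLINGS and conditions below are hypothetical and serve only
this sketch.

```python
# Couplings of FIG. 11 as (upstream IDMA, downstream IDMA) pairs, each
# realised by the upstream masterout block (16A, 16B).
COUPLINGS = [("A", "B"), ("B", "C")]


def conditions(module, couplings=COUPLINGS):
    """Return the set of conditions ('a', 'b') fulfilled by an IDMA module."""
    met = set()
    if any(up == module for up, _ in couplings):
        met.add("a")  # output buffer coupled to a downstream input buffer
    if any(down == module for _, down in couplings):
        met.add("b")  # input buffer coupled to an upstream output buffer
    return met
```

Extending the chain with further IDMA modules would then amount to
appending further pairs to the list.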

Operation of the transmission chain shown in FIG. 11 is
organized in several steps, and in fact only the three initial
steps involve the CPU.

At configuration, after system start up, the CPU
configures all the IDMAs. It downloads the RBF programs and all
the default values for the internal registers; the writing
address values for the output buffers are also initialized to
couple the masterout block 16A with the input buffer 11B and the
masterout block 16B with the input buffer 11C.

To activate the IDMAs, a "GOAL" instruction (as
described in the foregoing, see also FIG. 7) is transmitted to
every IDMA.

The CPU transfers the data to be processed into the
input buffer 11A. The RBF 13 in IDMA A begins to transfer data
to IP A and writes output data from the IP A in the output buffer
12A.

As soon as the output buffer 12A is filled with data,
the masterout block 16A transfers the data to the input buffer
11B; this occurs after performing the request/acknowledgement
procedure described in the foregoing with the input buffer 11B.

As soon as the input buffer 11B is filled with data,
the RBF 13 in IDMA B transfers the data to IP B and writes data
from IP B in the output buffer 12B.

As soon as the output buffer 12B is filled with data,
the masterout block 16B transfers the data to the input buffer
11C. Again, this occurs after a request/acknowledgement
procedure with the input buffer 11C.

As soon as the input buffer 11C is filled with data,
the RBF 13 in IDMA C transfers the data to IP C and writes data
from IP C in the output buffer 12C.

An interrupt is sent to the CPU to indicate that valid 
data are available in the output buffer 12C. 

The system loops through the steps exemplified in the
foregoing until all the data are processed.
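
By way of non-limiting illustration only, the flow of data through
the chain of FIG. 11 can be sketched as follows. The function
run_chain and the callables ip_a, ip_b and ip_c are hypothetical
stand-ins for the processing performed by IP A, IP B and IP C. The
sketch processes one chunk at a time and therefore ignores any
overlap between stages that the hardware may allow.

```python
def run_chain(chunks, ip_a, ip_b, ip_c):
    """Push each chunk through IDMA A, IDMA B and IDMA C in turn.

    Returns the successive contents of the output buffer 12C and the
    number of interrupts sent to the CPU.
    """
    results, interrupts = [], 0
    for chunk in chunks:         # loop until all the data are processed
        buf_11a = chunk          # CPU loads the input buffer 11A
        buf_12a = ip_a(buf_11a)  # RBF in IDMA A: 11A -> IP A -> 12A
        buf_11b = buf_12a        # masterout 16A: 12A -> 11B (after request/ack)
        buf_12b = ip_b(buf_11b)  # RBF in IDMA B: 11B -> IP B -> 12B
        buf_11c = buf_12b        # masterout 16B: 12B -> 11C (after request/ack)
        buf_12c = ip_c(buf_11c)  # RBF in IDMA C: 11C -> IP C -> 12C
        interrupts += 1          # interrupt: valid data available in 12C
        results.append(buf_12c)
    return results, interrupts
```

Between loading the input buffer 11A and receiving the interrupt,
the CPU appears nowhere in the loop body, consistent with the
statement that only the initial steps involve the CPU.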

It is evident that this is just a possible
implementation of an IDMA-based architecture, which in fact may
include any number of IDMAs. Several possible variants can be
easily conceived, such as for instance adding a masterin block to
IDMA A and/or a masterout block to IDMA C.

Of course, without prejudice to the underlying
principle of the invention, the details and embodiments may vary,
even significantly, with respect to what has been described by
way of example only without departing from the scope of the
invention as defined by the annexed claims.